CMDA 4654 Project 2

Julia Brady, Priya Bhat, Mason Colt, Matt Nissen, Dylan Fair, Jamal Mani

2025-11-22

What is Least Angle Regression?

Why Do We Need LARS?

Forward Selection Problems:

LARS Fixes This:

Key Idea

LARS moves in the direction that forms equal angles with all predictors most correlated with the residual.

Setup and Notation

We model:

\[ \mu = X\beta \]

At any step:

The Equiangular Direction

To update the model, LARS finds a unit vector \(u_A\) that makes equal angles with every predictor in the active set (equivalently, \(X_A^\top u_A\) has all entries equal).

Mathematically:

\[ u_A = X_A w_A, \quad w_A = \frac{G_A^{-1} 1_A}{\sqrt{1_A^\top G_A^{-1}1_A}} \]

This ensures a balanced movement among active predictors.
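A minimal numerical check of this formula (a NumPy sketch, not the lars package's code; the function name is illustrative) builds \(u_A\) from active columns \(X_A\) and verifies the equal-angle property:

```python
import numpy as np

# Given active columns X_A (centered, unit-norm), build the equiangular
# direction u_A via w_A = G_A^{-1} 1_A / sqrt(1_A' G_A^{-1} 1_A).
def equiangular(XA):
    k = XA.shape[1]
    G = XA.T @ XA                                # Gram matrix G_A
    Ginv_ones = np.linalg.solve(G, np.ones(k))   # G_A^{-1} 1_A
    A = 1.0 / np.sqrt(np.ones(k) @ Ginv_ones)    # normalizing constant
    w = A * Ginv_ones                            # weights w_A
    u = XA @ w                                   # equiangular vector u_A
    return u, A
```

By construction \(X_A^\top u_A = A\,1_A\) (equal inner product with every active predictor) and \(\|u_A\| = 1\).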

Updating the Model

The fitted values update via:

\[ \mu \leftarrow \mu + \gamma\, u_A \]

where the step size \(\gamma\) is the largest value we can move before some inactive predictor becomes as correlated with the residual as the active set; at that point, the tying predictor joins the active set.

Thus the active set expands exactly when it should.
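In the notation above, with \(\hat c_j = x_j^\top r\), \(a_j = x_j^\top u_A\), and \(C = \max_j |\hat c_j|\), the LARS paper (Efron et al., 2004, eq. 2.13) gives this step size in closed form:

\[ \hat\gamma = \min_{j \in A^c}{}^{+} \left\{ \frac{C - \hat c_j}{A_A - a_j},\; \frac{C + \hat c_j}{A_A + a_j} \right\} \]

where \(\min^{+}\) takes the minimum over positive candidates only and \(A_A = (1_A^\top G_A^{-1} 1_A)^{-1/2}\) is the normalizing constant from the equiangular direction.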

Algorithm Steps

Initial equation: \(y = \beta_0\). Final equation: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n\).

  1. Take the correlation of the residuals with every predictor variable and find the largest in absolute value.
  2. Add the highest-correlation predictor \(x_j\) to the equation with coefficient \(\gamma_j\): \(r = y - \beta_0 - \gamma_j x_j\), \(y = \beta_0 + \gamma_j x_j\)
  3. Increase \(\gamma_j\) in the direction of \(x_j\)'s correlation with \(y\) (positive or negative), updating the residuals along the way, until \(|Cor(x_j, r)| = |Cor(x_k, r)|\) for some other predictor \(x_k\).
  4. Continue moving along \((x_j, x_k)\) by increasing \((\gamma_j, \gamma_k)\) in their joint least squares direction, updating the residuals along the way, until some other predictor \(x_m\) is as correlated with the residual as they are.
  5. Repeat until all predictors are included in the model.

Algorithm

  1. Take the correlation of residuals with every predictor variable and find the maximum.

At the beginning of the algorithm, we set the intercept \(\beta_0\) equal to the average of the response vector, denoted \(\bar y\). The residuals are then given by \(r = y - \beta_0 = y - \bar y\).

We denote the correlation of the residuals with each predictor by \(Cor(r, \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}) = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}\)

Let \(c_{max} = \max \{ |c_1|, |c_2|, \dots, |c_n| \}\). We compare absolute values because a strong negative correlation is just as informative as a strong positive one.

We select the predictor \(x_j\) corresponding to \(c_{max}\) as our first active predictor.
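This first step can be sketched in a few lines (a NumPy illustration, not the lars package's code; the function name is ours):

```python
import numpy as np

# Step 1: correlate the intercept-only residual with each predictor and
# pick the largest magnitude.
def pick_first_predictor(X, y):
    r = y - y.mean()   # residuals after fitting beta_0 = ybar
    cors = np.array([np.corrcoef(X[:, i], r)[0, 1] for i in range(X.shape[1])])
    j = int(np.argmax(np.abs(cors)))   # index of the predictor achieving c_max
    return j, cors
```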

Algorithm

  2. Add the highest-correlation predictor \(x_j\) to the equation with coefficient \(\gamma_j\):

\(r = y - \beta_0 - \gamma_j x_j\)

\(\gamma_j\) is a temporary step size along predictor \(x_j\). We will compute its value and assign it to \(\beta_j\).

Algorithm

  3. Increase \(\gamma_j\) in the direction of \(x_j\)'s correlation with \(y\) (positive or negative), updating the residuals along the way, until \(|Cor(x_j, r)| = |Cor(x_k, r)|\) for some other predictor \(x_k\).

The direction of movement is given by \(u\), the unit vector along \(x_j\) (signed to match the correlation; predictors are assumed standardized). The residual vector, updated continuously as \(\gamma\) increases, is given by \(r(\gamma)\).

\(r(\gamma) = r_0 - \gamma u\), where \(r_0 = y - \bar y\).

To find \(\gamma_j\), consider every predictor \(x_i\) not yet in the model and solve

\(C = |Cor(x_j, r(\gamma))| = |Cor(x_i, r(\gamma))|\). This gives one candidate value of \(\gamma\) for each of the \((n-1)\) remaining predictors.

The smallest positive \(\gamma\) value is our \(\beta_j\). The corresponding predictor is the next active predictor, which we add in the next step and denote \(x_k\).

Now, we have \(y = \beta_0 + \beta_j x_j\).
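This single-active-predictor step can be sketched numerically (a NumPy illustration under the assumption of centered, unit-norm columns, so inner products play the role of correlations; the function name is ours):

```python
import numpy as np

# One LARS step with a single active predictor: find gamma_j and the
# next predictor to join. The +/- candidates come from matching either
# sign of Cor(x_i, r(gamma)) to |Cor(x_j, r(gamma))|.
def first_lars_step(X, y):
    r = y - y.mean()
    c = X.T @ r                        # current correlations with the residual
    j = int(np.argmax(np.abs(c)))      # first active predictor x_j
    C = abs(c[j])
    u = np.sign(c[j]) * X[:, j]        # unit direction of movement
    a = X.T @ u
    best = (np.inf, -1)
    for i in range(X.shape[1]):
        if i == j:
            continue
        for g in ((C - c[i]) / (1.0 - a[i]), (C + c[i]) / (1.0 + a[i])):
            if 1e-12 < g < best[0]:    # smallest positive candidate
                best = (g, i)
    gamma, k = best
    return j, gamma, k                 # beta_j = gamma; x_k joins next
```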

Algorithm

  4. Continue moving along \((x_j, x_k)\) in their joint least squares direction, updating the residuals along the way, until some other predictor \(x_m\) is as correlated with the residual as they are.

Because we now have two active predictors, the direction vector \(u\) is recomputed from both of them, and we advance a new step size \(\gamma_k\) to update the fitted values and residuals.

Then, for every predictor not yet in the model, we again find the smallest positive \(\gamma\) value. That step updates each active coefficient, and the corresponding predictor \(x_m\) is added to the model next.

Now, we have our updated \(\beta_k\), and our equation becomes \(y = \beta_0 + \beta_j x_j + \beta_k x_k\).

Algorithm

  4. (continued) Continue moving along \((x_j, x_k, x_m)\) in their joint least squares direction, updating the residuals along the way, until some other predictor \(x_p\) is as correlated with the residual as they are.

Once again, our direction vector \(u\) is updated according to all the predictors, and we will move \(\gamma_m\) to update both our fitted values and residuals.

We find the smallest positive \(\gamma\) value among all inactive predictors, update our coefficients, and add \(x_p\) to the model in the next iteration.

  5. Repeat until all predictors are included in the model.

This movement is piecewise linear, and each segment makes equal angles with all of the active predictors.
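The whole walkthrough above can be condensed into a short sketch. This is a minimal NumPy version of the steps, not the lars package's implementation: it tracks only the order in which predictors join the active set, standardizes the columns itself, and assumes no ties or degenerate designs.

```python
import numpy as np

# Minimal LARS sketch: return the order in which predictors enter.
def lars_order(X, y):
    n, p = X.shape
    X = X - X.mean(axis=0)
    X = X / np.linalg.norm(X, axis=0)            # standardize columns
    r0 = y - y.mean()                            # intercept-only residual
    mu = np.zeros(n)
    active, signs, order = [], [], []
    for _ in range(p):
        c = X.T @ (r0 - mu)                      # current correlations
        if not active:                           # step 1: max |correlation|
            j = int(np.argmax(np.abs(c)))
            active.append(j); signs.append(np.sign(c[j])); order.append(j)
        C = np.abs(c).max()
        Xa = X[:, active] * np.array(signs)      # sign-adjusted active columns
        Ginv_ones = np.linalg.solve(Xa.T @ Xa, np.ones(len(active)))
        A = 1.0 / np.sqrt(np.ones(len(active)) @ Ginv_ones)
        u = Xa @ (A * Ginv_ones)                 # equiangular direction
        a = X.T @ u
        cands = [(g, j) for j in range(p) if j not in active
                 for g in ((C - c[j]) / (A - a[j]), (C + c[j]) / (A + a[j]))
                 if g > 1e-12]                   # positive step-size candidates
        if not cands:                            # every predictor is active
            break
        gamma, jnext = min(cands)                # smallest positive gamma
        mu = mu + gamma * u                      # advance the fit
        active.append(jnext)
        signs.append(np.sign(X[:, jnext] @ (r0 - mu)))
        order.append(jnext)
    return order
```

With an orthogonal design, the entry order is simply the predictors sorted by decreasing absolute correlation with the response, which gives an easy sanity check.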

Geometric Picture

When LARS is Appropriate

Real-World Examples: LARS is Appropriate

When LARS is NOT Appropriate

Real-World Examples: LARS is NOT Appropriate

LARS using iris Dataset

LARS requires a numeric response vector and a numeric matrix of predictors:

library(lars)

y <- iris$Sepal.Length
x <- as.matrix(iris[, c("Sepal.Width", "Petal.Length", "Petal.Width")])

fit <- lars(x = x, y = y, type = "lar")

Model Summary

## 
## Call:
## lars(x = x, y = y, type = "lar")
## R-squared: 0.859 
## Sequence of LAR moves:
##      Petal.Length Sepal.Width Petal.Width
## Var             2           1           3
## Step            1           2           3

iris LARS Plot

We can see that Petal.Length has the strongest correlation with Sepal.Length, so it enters first. Sepal.Width enters second and Petal.Width last; note that the entry order is driven by correlation with the current residual, so it need not match the raw correlations with the response.

LARS using mtcars Dataset

In this model, we are looking to predict fuel efficiency (mpg) from all other variables:

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Model Summary

## LARS/LAR
## Call: lars(x = x_val, y = y_val, type = "lar")
##    Df     Rss       Cp
## 0   1 1126.05 130.3246
## 1   2  992.54 113.3155
## 2   3  378.79  27.9310
## 3   4  194.17   3.6454
## 4   5  190.76   5.1607
## 5   6  184.29   6.2386
## 6   7  170.09   6.2177
## 7   8  169.29   8.1030
## 8   9  157.32   8.3992
## 9  10  151.71   9.6000
## 10 11  147.49  11.0000

mtcars LARS Plot

Here we see that wt and cyl enter the model first, meaning they are more strongly correlated with mpg than other variables such as disp and qsec.

Key Findings

Comparing LARS to Other Regression Methods

Iris: LARS vs Other Regression Methods

We predict Sepal.Length from the three other numeric iris variables and compare four models:

Interpretation:

| Model | Test MSE |
|---|---|
| OLS | 0.079 |
| Stepwise | 0.079 |
| Lasso | 0.079 |
| LARS | 0.079 |

Iris: Lasso Cross-Validation Curve

Lasso selects its penalty parameter λ using cross-validation.

Iris: LARS Coefficient Paths

The LARS coefficient path shows:

mtcars: LARS vs Other Regression Methods

We predict mpg in the mtcars dataset using all other variables:

Interpretation:

| Model | Test MSE |
|---|---|
| OLS | 5.5893 |
| Stepwise | 5.5515 |
| Lasso | 5.3049 |
| LARS | 5.5410 |

Takeaways from Baby Datasets

Main Dataset

This project uses the Burke et al. (2022) global urban soil black carbon dataset, obtained from the Knowledge Network for Biocomplexity (KNB) at: https://knb.ecoinformatics.org/view/urn:uuid:1651eeb1-e050-4c78-8410-ec2389ca2363

The dataset pulls together measurements of black carbon in urban soils from cities around the world. Each row includes details like latitude/longitude, elevation, precipitation, soil temperature at different depths, land-cover type, population info, and notes from the original studies. The main sheet (“Urban Black Carbon”) contains 600+ observations and about 65 variables, giving us a wide mix of environmental and geographic predictors.

Because many of these variables move together (climate, location, soil traits, etc.), the dataset naturally has clusters of correlated features, which makes it a solid fit for demonstrating Least Angle Regression (LARS).

Data Dictionary

We removed variables with 90+% missing values to avoid unstable predictors and ensure consistent sample size across all variables. This threshold preserved essential environmental predictors while excluding sparse fields that contained too little information to contribute to modeling.
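This screening rule can be sketched as follows (a NumPy illustration; the function and column names are hypothetical, not the dataset's real field names):

```python
import numpy as np

# Drop any predictor whose fraction of missing values is 90% or more.
def drop_sparse_columns(data, names, threshold=0.9):
    frac_missing = np.isnan(data).mean(axis=0)   # per-column missingness
    keep = frac_missing < threshold
    return data[:, keep], [n for n, k in zip(names, keep) if k]
```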

BC vs Depth


Takeaways

Predictor Correlation Heatmap

Takeaways

LARS on Large Data

Test-Set Performance of the LARS Model
| Model | Test MSE |
|---|---|
| LARS (Cp-selected) | 153.7056 |

Note: On held-out soil samples, the model’s predictions differ from the observed black carbon values by about 12.4 mg/g on average (the test RMSE, \(\sqrt{153.71} \approx 12.4\)).

This level of error is expected for this dataset, because black carbon concentrations are extremely variable near the soil surface.

LARS Coefficient Path

The coefficient path illustrates:

Interpretation

Interpretation

References

Lucero, Christian. “Model Selection.” CMDA-4654: Intermediate Machine Learning & Data Analytics, Lecture 15, Virginia Tech, Fall 2025.

Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). “Least Angle Regression.” The Annals of Statistics, 32(2). https://tibshirani.su.domains/ftp/lars.pdf and https://projecteuclid.org/journals/annals-of-statistics/volume-32/issue-2/Least-angle-regression/10.1214/009053604000000067.full

“Least-angle regression.” Wikipedia. https://en.wikipedia.org/wiki/Least-angle_regression

“Least Angle Regression (LARS).” GeeksforGeeks. https://www.geeksforgeeks.org/machine-learning/least-angle-regression-lars/

https://link.springer.com/article/10.1007/s41237-024-00237-2

R Markdown Cookbook, “Figure Size.” https://bookdown.org/yihui/rmarkdown-cookbook/figure-size.html

R Markdown: The Definitive Guide, “ioslides Presentation.” https://bookdown.org/yihui/rmarkdown/ioslides-presentation.html

“What is Least Angle Regression (LARS).” Statistics Easily. https://statisticseasily.com/glossario/what-is-least-angle-regression-lars/